 Bakersfield


The Perverse, Tender Worlds of Paul Thomas Anderson

The New Yorker

The filmmaker behind "One Battle After Another" specializes in stories about people who are cut off, adrift, desperately seeking connection. His films are studies of American loneliness. The director plunges us into the physical realization of experience with a thoroughness that can be unsettling. What is the sound of a needle entering fabric? Something more significant, it seems, than the sound of one hand clapping. You hear a tiny pop followed by the rustle of violated muslin--a shudder in the silence of the universe. Scrupulous directors make sure that the sound of their movies is grossly efficient, so that the dramatic meaning of a scene is apparent even in the worst theatre or home system in the country. They also layer in, for those who care about such things, a secondary level of sound--think of the swishing skirts in Martin Scorsese's adaptation of Edith Wharton's "The Age of Innocence." In "Phantom Thread" (2017)--the needle-and-fabric movie--the director, Paul Thomas Anderson, uses such details to build an exquisitely perceptible epic of minute events.



Rare, deep-sea encounter: California scientists observe 'extraordinary' seven-arm octopus

Los Angeles Times

On November 6, 2025, MBARI Senior Scientist Steven Haddock and researchers in MBARI's Biodiversity and Biooptics Team observed a seven-arm octopus (Haliphron atlanticus) during an expedition in Monterey Bay with MBARI's remotely operated vehicle at a depth of approximately 700 meters. California scientists captured rare footage of a seven-arm octopus eating a jellyfish.


The Most Dangerous Genre

The New Yorker

Our obsession with deadly game shows--from "The Running Man" and "Squid Game" to MrBeast's real-life reënactments--reflects a shift in the national mood to something increasingly zero-sum. It seems we can't get enough of game shows in which the losers die. "The Hunger Games" became a multibillion-dollar media franchise over the past decade, with audiences returning to the theatre, time and time again, to watch adolescents try to kill one another in an enormous arena--a contest devised by the leaders of a society rife with inequality. Netflix's "Squid Game" followed four hundred and fifty-six desperate individuals into an underworld where they play lethal versions of children's games in the hope of winning a life-changing amount of money. Four weeks after its release, the show had become Netflix's most-watched series ever; to date, the first season has been viewed more than two hundred and sixty-five million times.



The study of short texts in digital politics: Document aggregation for topic modeling

arXiv.org Artificial Intelligence

Statistical topic modeling is widely used in political science to study text. Researchers examine documents of varying lengths, from tweets to speeches. There is ongoing debate on how document length affects the interpretability of topic models. We investigate the effects of aggregating short documents into larger ones based on natural units that partition the corpus. In our study, we analyze one million tweets by U.S. state legislators from April 2016 to September 2020. We find that for documents aggregated at the account level, topics are more associated with individual states than when using individual tweets. This finding is replicated with Wikipedia pages aggregated by birth cities, showing how document definitions can impact topic modeling results.
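The aggregation step described in this abstract can be illustrated with a minimal sketch. It assumes each short document is a dictionary with hypothetical `account` and `text` fields; the field names and the sample tweets are illustrative and do not come from the paper.

```python
from collections import defaultdict


def aggregate_by_unit(docs, unit_key="account", text_key="text"):
    """Concatenate short documents that share the same natural unit.

    `unit_key` and `text_key` are hypothetical field names chosen for
    this sketch; any partition of the corpus could be used instead.
    """
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc[unit_key]].append(doc[text_key])
    # One long document per unit, in the order the short texts were seen.
    return {unit: " ".join(texts) for unit, texts in grouped.items()}


# Invented sample data, for illustration only.
tweets = [
    {"account": "rep_a", "text": "School funding vote today"},
    {"account": "rep_b", "text": "Wildfire relief bill passed"},
    {"account": "rep_a", "text": "Budget hearing on education"},
]
aggregated = aggregate_by_unit(tweets)
```

The aggregated strings would then be passed to a topic model in place of the individual tweets; which topic model to use is left open here.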


UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models

arXiv.org Artificial Intelligence

Multimodal Entity Linking (MEL) is a crucial task that aims at linking ambiguous mentions within multimodal contexts to the referent entities in a multimodal knowledge base, such as Wikipedia. Existing methods focus heavily on using complex mechanisms and extensive model tuning methods to model the multimodal interaction on specific datasets. However, these methods overcomplicate the MEL task and overlook the visual semantic information, which makes them costly and hard to scale. Moreover, these methods cannot resolve issues such as textual ambiguity, redundancy, and noisy images, which severely degrade their performance. Fortunately, the advent of Large Language Models (LLMs) with robust capabilities in text understanding and reasoning, particularly Multimodal Large Language Models (MLLMs) that can process multimodal inputs, provides new insights into addressing this challenge. However, how to design a universally applicable LLM-based MEL approach remains a pressing challenge. To this end, we propose UniMEL, a unified framework which establishes a new paradigm to process multimodal entity linking tasks using LLMs. In this framework, we employ LLMs to augment the representation of mentions and entities individually by integrating textual and visual information and refining textual information. Subsequently, we employ an embedding-based method for retrieving and re-ranking candidate entities. Then, with only ~0.26% of the model parameters fine-tuned, LLMs can make the final selection from the candidate entities. Extensive experiments on three public benchmark datasets demonstrate that our solution achieves state-of-the-art performance, and ablation studies verify the effectiveness of all modules. Our code is available at https://github.com/Javkonline/UniMEL.
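The embedding-based retrieval and re-ranking step can be illustrated generically. The sketch below ranks candidate entities by cosine similarity to a mention embedding. It is not the UniMEL implementation: the function names, the three-dimensional vectors, and the entity labels are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_candidates(mention_vec, entity_vecs, k=2):
    """Return the k entities most similar to the mention, best first."""
    scored = [(name, cosine(mention_vec, vec)) for name, vec in entity_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings, invented for this sketch.
entities = {
    "Apple Inc.": [0.9, 0.1, 0.0],
    "Apple (fruit)": [0.1, 0.9, 0.0],
    "Orange": [0.0, 0.8, 0.6],
}
mention = [0.8, 0.2, 0.0]
candidates = top_k_candidates(mention, entities, k=2)
```

In a pipeline like the one described, the shortlist in `candidates` would be handed to a language model for the final selection; that stage is not sketched here.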


DataComp-LM: In search of the next generation of training sets for language models

arXiv.org Artificial Intelligence

We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct extensive experiments and find that model-based filtering is key to assembling a high-quality training set. The resulting dataset, DCLM-Baseline, enables training a 7B parameter language model from scratch to 64% 5-shot accuracy on MMLU with 2.6T training tokens. Compared to MAP-Neo, the previous state-of-the-art in open-data language models, DCLM-Baseline represents a 6.6 percentage point improvement on MMLU while being trained with 40% less compute. Our baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% & 66%), and performs similarly on an average of 53 natural language understanding tasks while being trained with 6.6x less compute than Llama 3 8B. Our results highlight the importance of dataset design for training language models and offer a starting point for further research on data curation.
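Two of the curation strategies named in this abstract, deduplication and score-based filtering, can be sketched in a few lines. This is an illustrative toy, not the DCLM pipeline: `toy_quality_score` is a hypothetical stand-in for a learned quality model, and the threshold and sample texts are invented.

```python
import hashlib


def deduplicate(docs):
    """Drop exact duplicates after lowercasing and collapsing whitespace."""
    seen = set()
    unique = []
    for text in docs:
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def filter_by_score(docs, score_fn, threshold):
    """Keep only documents whose score meets the threshold."""
    return [doc for doc in docs if score_fn(doc) >= threshold]


def toy_quality_score(text):
    """Hypothetical scorer: fraction of purely alphabetic words."""
    words = text.split()
    if not words:
        return 0.0
    return sum(word.isalpha() for word in words) / len(words)


raw = [
    "The cat sat on the mat",
    "the  cat sat on the mat",
    "$$$ 123 !!! buy now",
    "Data curation matters",
]
curated = filter_by_score(deduplicate(raw), toy_quality_score, threshold=0.8)
```

In a model-based setup, `score_fn` would be replaced by a trained classifier's output; the surrounding control flow could stay the same.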


Cellular Traffic Prediction Using Online Prediction Algorithms

arXiv.org Artificial Intelligence

The advent of 5G technology promises a paradigm shift in the realm of telecommunications, offering unprecedented speeds and connectivity. However, the efficient management of traffic in 5G networks remains a critical challenge. This is due to the dynamic and heterogeneous nature of network traffic, varying user behaviors, extended network size, and diverse applications, all of which demand highly accurate and adaptable prediction models to optimize network resource allocation and management. This paper investigates the efficacy of live prediction algorithms for forecasting cellular network traffic in real-time scenarios. We apply two live prediction algorithms on machine learning models, one of which is the recently proposed Fast LiveStream Prediction (FLSP) algorithm. We examine the performance of these algorithms under two distinct data gathering methodologies: synchronous, where all network cells report statistics simultaneously, and asynchronous, where reporting occurs across consecutive time slots. Our study delves into the impact of these gathering scenarios on the predictive performance of traffic models. Our study reveals that the FLSP algorithm can halve the required bandwidth for asynchronous data reporting compared to conventional online prediction algorithms, while simultaneously enhancing prediction accuracy and reducing processing load. Additionally, we conduct a thorough analysis of algorithmic complexity and memory requirements across various machine learning models. Through empirical evaluation, we provide insights into the trade-offs inherent in different prediction strategies, offering valuable guidance for network optimization and resource allocation in dynamic environments.
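The general idea of online prediction, updating a forecast each time a new report arrives instead of retraining in batches, can be shown with a deliberately simple stand-in. The sketch below uses an exponentially weighted moving average per cell; it is not the FLSP algorithm, and the class name, smoothing factor, and traffic values are assumptions made for illustration.

```python
class OnlineEWMA:
    """Exponentially weighted moving average updated one sample at a time."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha      # weight given to the newest observation
        self.estimate = None    # no forecast until the first report arrives

    def update(self, observed):
        if self.estimate is None:
            self.estimate = float(observed)
        else:
            self.estimate = self.alpha * observed + (1 - self.alpha) * self.estimate
        return self.estimate

    def predict(self):
        return self.estimate


# Asynchronous-style reporting: one (cell, load) report per time slot.
reports = [("cell_1", 10), ("cell_2", 40), ("cell_1", 20), ("cell_1", 30)]
predictors = {}
for cell_id, load in reports:
    predictors.setdefault(cell_id, OnlineEWMA(alpha=0.5)).update(load)

forecast_cell_1 = predictors["cell_1"].predict()
```

Each update touches only the state of the reporting cell, which is the property that makes this style of predictor usable when cells report in different slots.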


RACH Traffic Prediction in Massive Machine Type Communications

arXiv.org Artificial Intelligence

Traffic pattern prediction has emerged as a promising approach for efficiently managing and mitigating the impacts of event-driven bursty traffic in massive machine-type communication (mMTC) networks. However, achieving accurate predictions of bursty traffic remains a non-trivial task due to the inherent randomness of events, and these challenges intensify within live network environments. Consequently, there is a compelling imperative to design a lightweight and agile framework capable of assimilating continuously collected data from the network and accurately forecasting bursty traffic in mMTC networks. This paper addresses these challenges by presenting a machine learning-based framework tailored for forecasting bursty traffic in multi-channel slotted ALOHA networks. The proposed machine learning network comprises long short-term memory (LSTM) and a DenseNet with feed-forward neural network (FFNN) layers, where the residual connections enhance the training ability of the machine learning network in capturing complicated patterns. Furthermore, we develop a new low-complexity online prediction algorithm that updates the states of the LSTM network by leveraging frequently collected data from the mMTC network. Simulation results and complexity analysis demonstrate the superiority of our proposed algorithm in terms of both accuracy and complexity, making it well-suited for time-critical live scenarios. We evaluate the performance of the proposed framework in a network with a single base station and thousands of devices organized into groups with distinct traffic-generating characteristics. Comprehensive evaluations and simulations indicate that our proposed machine learning approach achieves a remarkable 52% higher accuracy in long-term predictions compared to traditional methods, without imposing additional processing load on the system.
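For readers unfamiliar with the setting, one slot of multi-channel slotted ALOHA can be simulated in a few lines. The sketch assumes each active device picks a channel uniformly at random and that a channel succeeds only when exactly one device chose it. The function name and parameters are illustrative and are not taken from the paper.

```python
import random
from collections import Counter


def simulate_slot(num_active, num_channels, rng):
    """Simulate one slot; return (successes, collisions, idle) channel counts."""
    # Each active device independently picks one channel.
    picks = Counter(rng.randrange(num_channels) for _ in range(num_active))
    successes = sum(1 for count in picks.values() if count == 1)
    collisions = sum(1 for count in picks.values() if count > 1)
    idle = num_channels - len(picks)
    return successes, collisions, idle


rng = random.Random(0)  # fixed seed so repeated runs give the same draw
burst = simulate_slot(num_active=30, num_channels=10, rng=rng)
quiet = simulate_slot(num_active=1, num_channels=10, rng=rng)
```

Per-slot counts of this kind are the sort of observation a traffic predictor could consume; how the paper's framework encodes its inputs is not specified in the abstract.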